Abstract

This article explores the role of text cohesion in the comprehension and production of text. While 
most discourse models have considered the roles of the text features and the reader, the crucial role 
of writers’ epistemic stance has not been widely considered. The thesis explored here is that levels of 
cohesion emerge in text based on the epistemic stance of the author relative to the reader. Evidence is 
provided indicating that text genres (i.e. science, narrative) show compensatory relationships between 
different features related to text difficulty. For example, while science texts have more challenging 
words than do narratives, they tend to have higher cohesion and simpler syntax. These text profiles 
indicate that skilled writers have an awareness of readers’ needs. By contrast, less skilled writers seem 
to have less sensitivity to the interplay between textual dimensions and less audience awareness. 
For example, evidence is reviewed showing that more proficient essays are characterized by lower 
cohesion than less proficient essays: less skilled writers tend to use more cohesive cues (when they are 
likely unnecessary) than do more skilled writers. To the extent that an author understands the readers’ 
needs, the author has a more successful epistemic stance toward the reader. This stance is partially 
evidenced by the crucial role of cohesion in text comprehension and writing. 


Keywords 
Coherence, cohesion, epistemic stance, reading comprehension, text genre, writing, 
writing proficiency 


Cohesive cues are essential elements of text. They bind together clauses, sentences, and 
paragraphs. They provide the semantic glue to text by specifying the relations between 
the elements. Without these cues, readers without sufficient prior knowledge about the 
text domain often fail to comprehend the text. Text comprehension crucially depends on
text cohesion. 

Writing researchers and educators have historically made the same assumptions about 
text production and text quality. Many have assumed that better writing is characterized 
by higher cohesion and the use of more cohesive cues. Indeed, there is an intuitive sense 
that higher quality writing is more coherent, and this coherence is intrinsically tied to 
cohesion. Hence, there has been a general assumption that the same cohesive cues that 
facilitate text comprehension are also related to enhanced judgments of text quality. 
However, the research reviewed in this article has found just the opposite: judgments of
writing quality are in some cases unrelated to cohesion and in others negatively affected
by it. Low cohesion writing is often judged as higher quality in comparison to writing
with more cohesive cues.


Cohesion facilitates comprehension but lowers 
judgments of writing quality 


Why would the presence of cohesion be so crucial for text comprehension but its absence
lead to perceptions of higher writing quality? This article explores a potential answer to
this apparent paradox. The bases of this account lie in joint considerations of the knowledge
demands of the text as well as the epistemic stance of the author toward the intended
audience or reader. The importance of cohesion is examined in relation to the demands 
of the text, both in terms of knowledge and in relation to other features of the text. 
Accordingly, multiple features and dimensions of texts operate in concert with one 
another in relation to the purpose of the text (e.g. genre) and the intended reader. Different 
dimensions of text work to compensate for one another such that when texts are more 
challenging in terms of one dimension, they tend to be less challenging in another. 
Further, the claim is offered here that an author’s epistemic stance toward the reader 
helps to drive these compensatory relationships between features in text, and particularly 
the level of cohesion in the text. This claim is partially supported by variations in the 
levels of cohesion across different genres relative to other text features, and the decrease 
in the use of cohesive cues in text as writers develop. 


Cohesion in text 


Cohesion refers to the degree of semantic overlap between concepts in text or discourse. 
Cohesive cues are grounded in explicit linguistic elements (i.e. words, features, cues, 
signals, constituents) and their combinations (Graesser and McNamara, 2011). Consider 
the following two examples from Haviland and Clark (1974): 


Example A George got some beer out of the car. The beer was warm. 


Example B George got some picnic supplies out of the car. The beer was warm.


The second sentence, The beer was warm, is read more quickly in the context of Example 
A where there is overlap in the referent, beer, in comparison to Example B where there 
is no common referent between the two sentences. When text is read more quickly, it is
assumed that the text is easier to process for the reader. Indeed, there are numerous stud- 
ies that have provided evidence that referential overlap impacts reading times and recall 
of words and sentences (e.g. Haviland and Clark, 1974; Kintsch and Keenan, 1973; 
Kintsch et al., 1975). 
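
The notion of referential overlap in Examples A and B can be made concrete with a small sketch. The code below is an illustration only, not the measure used in the cited studies or in any particular tool: the stop list and the function names (content_words, adjacent_overlap) are assumptions introduced here, and a serious implementation would rely on part-of-speech tagging and lemmatization rather than a hand-written word list.

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "some", "of", "out", "was", "got", "he", "it"}

def content_words(sentence):
    """Lowercase the sentence and keep the words that are not on the stop list."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}

def adjacent_overlap(sentences):
    """Proportion of adjacent sentence pairs that share at least one content word."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)

example_a = ["George got some beer out of the car.", "The beer was warm."]
example_b = ["George got some picnic supplies out of the car.", "The beer was warm."]

print(adjacent_overlap(example_a))  # 1.0: "beer" appears in both sentences
print(adjacent_overlap(example_b))  # 0.0: no shared content word
```

Under these assumptions, Example A scores higher than Example B, which mirrors the direction of the reading time difference described above. The score says nothing about whether the reader can bridge the gap in Example B with prior knowledge.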

Cohesion and cohesive cues are explicit features of text and discourse. Notably, these 
cues are neither necessary nor sufficient to produce a coherent understanding of that text. 
Consider the following two examples: 


Example C George got some beer out of the car. He grabbed the beer. The beer was warm. 


Example D George got some picnic supplies out of the car. Supplies are materials or 
provisions stored and dispensed when needed. The beer was warm. 


In Example C, the second sentence, He grabbed the beer, is unnecessary for most 
readers because it is easily understood that getting the beer out of the car likely involves 
grabbing it. In Example D, the word supplies in the second sentence overlaps with the 
same word in the first sentence, technically providing cohesion, but this overlap is insuf- 
ficient for the reader to construct a coherent representation of the situation. Indeed, the 
information about supplies would likely interfere with the processing of these sentences. 
Hence, cohesion often facilitates comprehension, but cohesion does not have a one-to- 
one correspondence with the coherence of a mental representation. 

Researchers have also gone beyond the sentence reading time approach by examining 
the effects of cohesion in the context of longer, more natural texts closer to the length of 
typical textbook chapters (for a review, see McNamara et al., 2010a). Among the first to 
tackle this issue, Beck et al. (1984) revised two second-grade narrative passages. Their 
revisions targeted a variety of problems which they categorized into three types: 1) surface
problems, including syntactic complexity, unclear relations between reference and referent
in the text, and the inappropriate use of connectives; 2) knowledge problems, involving
readers’ lack of familiarity with the meaning and significance of events, and the
relations between the events; and 3) content problems, attributed to ambiguous, irrelevant,
or confusing content. Their revisions targeted cohesion, but also addressed an array
of issues that contribute to the coherence of the text for a reader. They found overall 
benefits of the text revisions on third-grade children’s ability to recall the passages as 
well as their ability to answer multiple-choice questions. Beck et al. (1991) later extended 
these findings to fifth-grade children’s comprehension of social studies texts and to a 
wider set of dependent measures. 

Britton and Gulgoz (1991) demonstrated the benefits of adding cohesion to text with 
college student readers. They manipulated a text about the war in Vietnam from four dif- 
ferent theoretical perspectives. Most relevant here is the principled version. For that, 
they focused primarily on increasing cohesive cues from the perspective of Kintsch and 
Van Dijk’s theory of text processing (e.g. Kintsch and Van Dijk, 1978; Miller and Kintsch, 
1980; Van Dijk and Kintsch, 1983). Based on Kintsch and Van Dijk’s model of comprehension,
they identified cohesion breaks in the text where there was no explicit cue on
how the new information was linked to prior text. To repair these breaks, they added
referential (i.e. argument) overlap, rearranged parts of each sentence so that readers first 
received old information (i.e. an idea presented previously in the text) and then the new
information, or rendered explicit any implicit references that did not have a clear 
referent. 

Britton and Gulgoz (1991) found that the principled revision improved comprehension 
in comparison to all three of the other versions according to their three dependent meas- 
ures (i.e. free recall, multiple choice questions, and a keyword association task). McNamara 
and Kintsch (1996) later replicated the Britton and Gulgoz findings with the same text, 
showing advantages for the principled revision, and further showing that the cohesion 
revisions particularly benefited readers who had less knowledge about the topic. 


Cohesion and knowledge 


To the extent that knowledge plays a role in comprehending a text or discourse, the 
effects of cohesion become more prominent. Knowledge is among the most important 
factors to consider in the comprehension of text and discourse. Readers who have more 
knowledge about the topic of a text or discourse better understand the material (e.g. 
Alexander et al., 1994; Bransford and Johnson, 1972; Chiesi et al., 1979; Haenggi and 
Perfetti, 1994; McNamara and Kintsch, 1996). Readers with more knowledge about a 
domain process the information more quickly, remember more of the information, under- 
stand the information at a deeper level, and even more effectively ignore irrelevant infor- 
mation (McNamara and McDaniel, 2004). Clearly reading skill is important to 
comprehension, but in most cases, the effects of prior knowledge dominate (e.g. Chiesi 
et al., 1979; for a review, see McNamara and O’Reilly, 2009). Early evidence for the 
importance of knowledge was provided by Bransford and Johnson (1972), who demon- 
strated that something as simple as a title can be essential for the reader to activate the 
appropriate knowledge, particularly for ambiguous texts. Without the title of a text about 
washing clothes, the text was nearly incomprehensible. While these texts were manipu- 
lated carefully to show the desired effects, subsequent studies have consistently repli- 
cated the importance of activating knowledge and having sufficient knowledge to 
understand text and discourse. 

What is the nature of the role of knowledge during comprehension? Multiple sources 
of knowledge are crucial to comprehension. First, the reader must access or activate the 
words in the discourse or the text. The reader’s knowledge of the words themselves is the 
first step toward comprehension, comprising what is referred to as the surface structure 
of a mental representation (Kintsch, 1988; Van Dijk and Kintsch, 1983). Indeed, word 
knowledge is highly related to basic comprehension and is a separable construct from 
comprehension skill (Chiesi et al., 1979; Perfetti, 1989). The second role of knowledge 
is in the spreading of activation between related concepts in the discourse. This spread- 
ing activation results in connections between related concepts within the discourse, or 
bridging inferences. This aids a reader in the formation of a textbase. The third role of 
knowledge is to go beyond the text, activating concepts that are not explicit in the text, 
forming links and bridges between concepts within the text and to prior knowledge. To 
the degree that the reader goes beyond the text, the reader forms a richer, more complete 
situation model level understanding. The reader uses knowledge to integrate meanings of 
individual sentences into a coherent representation of situations or events depicted by the 
overall text (e.g. Kintsch, 1988, 1998). Thus, the situation model refers to the integration
of the textbase and the reader’s prior knowledge. 

Cohesion gaps in the text increase the demands of knowledge for the reader. A good 
deal of research has demonstrated that the effects of cohesion on comprehension and 
learning become particularly apparent when knowledge comes into play. Consider, for 
example, the following three sets of sentences: 


Example 1. Some animals have the ability to grow back entire body parts lost through accident
or injury. This process is called regeneration. 


Example 2. Some animals have the ability to grow back entire body parts lost through accident 
or injury. For example, the lizard can grow back his entire tail. This process is 
called regeneration. 


Example 3. Some animals have the ability to grow back entire body parts lost through accident 
or injury. For example, the lizard can grow back his entire tail. The process of 
growing back lost body parts is called regeneration. 


Through this text, it appears that the author intends for the reader to learn the meaning of 
the term regeneration. Many readers will have a sufficient level of knowledge to under- 
stand and potentially remember the term by reading the first set of sentences (i.e. Example 
1). At a minimum, the reader is required to generate a backwards referential inference 
that this process refers to the process of growing back lost body parts. Example 2 also 
requires this inference, but provides an example that allows the reader to more readily 
connect either prior knowledge or a grounded example to the concept. The third example 
scaffolds the reader by providing the example and also relieving the need for the referen- 
tial inference. 

Low knowledge readers comprehend text and discourse best when there are fewer cohe- 
sion gaps, as in the third example (McNamara et al., 1996; for a review, see McNamara et 
al., 2010b). Low knowledge readers are particularly challenged by cohesion gaps because 
knowledge is required to bridge the gaps. The reader who lacks sufficient knowledge strug- 
gles to generate the required inferences and is unlikely to form a coherent mental represen- 
tation of the text. For low knowledge readers, cohesion is often necessary to construct a 
coherent representation of the text, at least at the textbase level. Returning to the prior 
example, the backwards inference that this process refers to the process of growing back 
lost body parts requires knowing that growing back body parts is a process, and that this 
process did not refer to losing body parts. It further requires some knowledge of the word
generation and linking that word to growing as well as knowing that the prefix re indicates 
repetition, and hence growing back. Moreover, without any experience of having seen an 
animal grow back body parts, this sentence may seem nonsensical to many children who 
are quite sure that body parts do not grow back when they are lost. Hence, while on the 
surface these two sentences seem quite easy to comprehend, for many readers they may be 
challenging, if not impossible to understand. 

A high knowledge reader who already knows what regeneration means will be rela- 
tively unaffected by the levels of cohesion in the three examples in terms of recall or 
learning. However, a reader who is learning what regeneration means, but also possesses 
the knowledge such as described above, will benefit from reading the lower cohesion 
version as in the first example. High knowledge readers can benefit from reading low
cohesion text. When high knowledge readers are not likely to generate inferences on 
their own (e.g. O’Reilly and McNamara, 2007), cohesion gaps induce them to generate 
inferences that benefit learning. Those inferences serve to connect the new information 
in the text with prior knowledge (e.g. experiences of seeing animals that have regrown 
body parts) and to form more connections between concepts within the text. For these 
readers, cohesion is not necessary, and can even potentially interfere with the active 
inference processes that emerge when cohesion gaps are encountered. Comprehension is 
more successful and deeper if the reader activates relevant knowledge and integrates that 
knowledge with the information explicitly stated in the text. This deep, constructionist 
processing on the part of the reader (Graesser et al., 1994) contrasts with minimalist 
processing, where the reader makes few inferences and processes the text at a shallow 
level (McKoon and Ratcliff, 1992). 

In sum, comprehension is enhanced to the extent that the reader generates inferences 
while reading (Vidal-Abarca et al., 2000; Wolfe and Goldman, 2005). However, a suffi- 
cient level of knowledge is necessary to generate inferences. When there are copious 
cohesion gaps combined with insufficient knowledge, comprehension fails. Thus, high 
cohesion benefits low knowledge readers, but low cohesion can benefit high knowledge 
readers, particularly those who need to be pushed to generate inferences (O’Reilly and 
McNamara, 2007). 


Coh-Metrix and measuring language 


The importance of cohesion to comprehension and learning pointed toward the need for 
an automated tool to measure cohesion. Coh-Metrix was developed to meet this need. 
Coh-Metrix is an automated tool that provides estimates of cohesion as well as numerous 
other features of language (Graesser and McNamara, 2011; McNamara and Graesser, 
2012; McNamara et al., 2010b). It was constructed based on theories of text and dis- 
course, most predominantly the Van Dijk and Kintsch (1983; Kintsch and Van Dijk, 
1978) model of discourse comprehension. Accordingly, our goal was to provide information
about text corresponding to the different levels of comprehension, such as the surface
structure, textbase, and situation model. Coh-Metrix provides indices of language
automatically by combining and integrating lexicons, part-of-speech classifiers, syntactic
parsers, latent semantic analysis (a statistical representation of world knowledge
based on corpus analyses), and other common computational linguistics components. A
wide range of measures are provided, including descriptive indices (e.g. number and 
length of words, sentences, paragraphs), word indices (e.g. word frequency, hypernymy, 
polysemy, concreteness), sentence indices (e.g. syntactic difficulty), lexical diversity 
(i.e. the variety of words), referential cohesion (i.e. overlap in words or concepts), con- 
nectives (e.g. because, so, moreover), and indices reflective of the situation model (e.g. 
temporal cohesion, causal cohesion). 

Numerous studies have been conducted using Coh-Metrix (for reviews, see McNamara 
and Graesser, 2012; McNamara et al., 2010a). Coh-Metrix is often used to understand 
the characteristics of texts that other researchers have selected or created in experimental 
studies of text comprehension. We have also used Coh-Metrix in studies of natural 
language and discourse processing examining a wide range of topics and tasks, such as
paraphrasing, explaining, answering questions, writing essays, tutoring, and detection of 
deception (McNamara and Graesser, 2012). 

The primary motivation for the development of Coh-Metrix was to augment tradi- 
tional formulas used to estimate the difficulty of text. Readability formulas such as the 
Flesch Reading Ease (Flesch, 1948; Klare, 1974-1975) and Degrees of Reading Power 
(DRP; Koslin et al., 1987) have been used for decades to estimate text difficulty in rela- 
tion to the grade level or reading ability of the reader. One limitation of traditional read- 
ability measures is that they consider only the superficial characteristics of text reflected 
by the length or frequency of the words in the texts and the length or syntactic complex- 
ity of the sentences in the text. These readability measures can provide excellent predic- 
tors of sentence level understanding and the amount of time it takes to read a passage. 
However, while they successfully predict readers’ surface understanding in terms of their 
understanding of the words and of individual sentences, readability formulas fall short in 
aligning with deeper levels of comprehension. Moreover, it is clear that there is more to 
language than the words and separate sentences. 
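
To make the limitation concrete, the following sketch computes the Flesch Reading Ease score, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The formula is the standard published one; the syllable counter, the sentence splitter, and the function names are rough assumptions made for illustration, so the resulting scores are approximate.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease from average sentence length and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

original = "George got some beer out of the car. The beer was warm."
reordered = "The beer was warm. George got some beer out of the car."

# The score depends only on counts, so reordering the sentences changes nothing.
print(flesch_reading_ease(original) == flesch_reading_ease(reordered))  # True
```

Because the score is built from word and sentence counts alone, it is blind to the order of sentences and to any overlap between them, which is the gap that cohesion measures are meant to fill.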

To provide estimates of other sources of difficulty in text, Graesser et al. (2011) con- 
ducted a principal components analysis (PCA) on Coh-Metrix indices for 37,520 texts 
in the Touchstone Applied Science Associates (TASA) corpus. PCA reduced the large 
multivariate database to eight functional dimensions that accounted for 67.3% of the 
variance between texts. These dimensions were well aligned with theories of text and 
discourse. Of these eight (narrativity, syntactic simplicity, word concreteness, referen- 
tial cohesion, deep cohesion, verb cohesion, temporal cohesion, logical cohesion), the 
first five components accounted for 54% of the variance and were most closely aligned 
with the construct of text difficulty. These five are described below. 


1. Narrativity. Narrative text tells a story, with characters, events, places, and things 
that are familiar to the reader. Narrative is closely affiliated with everyday, oral 
conversation. This component is highly affiliated with word familiarity, world 
knowledge, and oral language. Non-narrative texts on less familiar topics lie at 
the opposite end of the continuum. They contain more information, and more of 
that information will tend to be unfamiliar to some readers. 

2. Syntactic simplicity. This component reflects the degree to which the sentences
in the text contain fewer words and use simpler, more familiar syntactic structures,
which are generally less challenging to process. At the opposite end of the
continuum are texts that contain sentences with more words and use complex 
syntactic structures. 

3. Word concreteness. Texts that contain content words that are concrete, meaning- 
ful, and evoke mental images are easier to process and understand. Abstract 
words represent concepts that are difficult to represent visually. Texts that contain 
more abstract words can be more challenging to understand. 

4. Referential cohesion. A text with high referential cohesion contains words and 
ideas that overlap across sentences and the entire text, forming explicit threads 
that connect the text for the reader. Low cohesion text is typically more diffi- 
cult to process because there are fewer connections that tie the ideas together 
for the reader. However, if the reader has sufficient knowledge, then the
required inferences in low cohesion text may benefit comprehension. 

5. Deep cohesion. This dimension reflects the degree to which the text contains 
causal and intentional connectives when there are causal and logical relationships 
within the text. These connectives help the reader to form a more coherent and 
deeper understanding of the causal events, processes, and actions in the text. 
When a text contains many relationships but does not contain those connectives, 
then the reader must infer the relationships between the ideas in the text. If the 
text is high in cohesion, then those relationships and global cohesion are more 
explicit. 


These components paint a multi-dimensional picture of text difficulty. They convey 
the fundamental notion that the challenges within a text emerge from different aspects of 
the text. As depicted in Figure 1, multiple aspects of language affect what can be experi- 
enced as the difficulty of the text. By contrast, a unidimensional model of text difficulty 
assumes that word and sentence challenges combine to produce difficulty. 

While their simplicity and alignment with grade level are appealing to many, unidimensional
representations of comprehension are unsatisfactory for a number of reasons.
First, unidimensional representations of comprehension ignore the importance of read- 
ers’ deeper levels of understanding. Traditional readability measures do not tap the more 
global levels of discourse meaning, cohesion, and differences between text genres (e.g.
narrative vs informational texts). Most comprehension models (Graesser and McNamara, 
2011; Kintsch, 1998; McNamara and Magliano, 2009; Van Dijk and Kintsch, 1983) pro- 
pose that there are multidimensional levels of understanding that emerge during the com- 
prehension process, including (at least) surface, textbase, and situation model levels. 

Unidimensional models also give no information about the genre of a text. Whether 
the reader can develop a global, deep-level comprehension of the overall text meaning 
is greatly affected by text genre (e.g. narrative vs expository), primarily because of the 
relationship between knowledge and genre. Narrative texts usually present reoccurring 
topics and events (e.g. friendship, love, travel, death) in specific contexts involving 
particular characters, settings, and times. Readers generally have had a vast amount of
experience related to the events and situations described in typical narrative texts. The
purpose of a narrative is to create a novel story around generally familiar people,
places, events, and things. By contrast, the purpose of expository texts (e.g. science
texts) is to present new information that is to be learned by the reader. Expository texts
usually present specific facts and relations between those facts to provide the reader
with information about concepts and events. Importantly, and by design, readers are
likely to be relatively unfamiliar with much of the text’s content. Thus, whereas readers
may easily draw on background knowledge to comprehend narrative texts, they
may struggle to develop situation model representations of expository texts as they
lack necessary background knowledge.

Figure 1. Multiple dimensions contributing to text difficulty.

In addition, the relative challenges of expository and narrative text lie in their relation- 
ship to the reader as well as what the reader must do with the text: the reader learns from 
the text or enjoys the text. Learning from text requires activating and connecting to prior 
knowledge. However, unidimensional models ignore complex relationships between the 
characteristics of the text and the reader’s individual differences. For example, a reader’s 
prior knowledge will interact with some aspects of text such as narrativity, word concrete- 
ness, and cohesion. However, reading skill will have larger interactions with others, such 
as syntactic complexity. Without separate measures of the different characteristics of text, 
predictions of how well an individual will comprehend a text are limited. 


A dance among linguistic factors 


Another reason that the difficulty or ease of a text should not be assessed unidimensionally
is that multiple levels often work to compensate for one another (McNamara
et al., 2012a). When texts are challenging in terms of one dimension, they will tend to be 
easier in another. Texts will rarely have challenges at all levels of difficulty. Consider for 
example an informational text, with unfamiliar, abstract words, long complex sentences, 
and with few explicit connections or cohesive cues between sentences. At extreme lev- 
els, this would be a relatively unnatural text. For example, turning to the TASA corpus of 
texts used in the study reported by Graesser et al. (2011), only 89 out of 37,520 (0.24%) 
TASA texts are below the 30th percentile on all five of the Coh-Metrix components, and 
likewise, only 88 out of 37,520 (0.23%) TASA texts are above the 30th percentile on all 
five of the Coh-Metrix components. Hence, over 99% of the texts have at least one 
dimension that is below or above the 30th percentile. 
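
The kind of count reported above can be sketched as follows. The percentile profiles are invented toy values, not TASA data, and the component keys and the helper all_below are hypothetical names chosen for this illustration. The last lines simply recompute the percentages from the counts given in the text.

```python
# Component names follow the five easability dimensions described in the text;
# the dictionary keys themselves are assumptions made for this sketch.
COMPONENTS = ("narrativity", "syntactic_simplicity", "word_concreteness",
              "referential_cohesion", "deep_cohesion")

def all_below(profile, cutoff=30):
    """True if every component percentile in the profile is below the cutoff."""
    return all(profile[c] < cutoff for c in COMPONENTS)

# Toy percentile profiles, made up for illustration only.
toy_texts = [
    {"narrativity": 12, "syntactic_simplicity": 80, "word_concreteness": 25,
     "referential_cohesion": 75, "deep_cohesion": 60},   # hard words, easy structure
    {"narrativity": 10, "syntactic_simplicity": 15, "word_concreteness": 20,
     "referential_cohesion": 5, "deep_cohesion": 25},    # challenging on every dimension
]
uniformly_hard = [t for t in toy_texts if all_below(t)]
print(len(uniformly_hard))  # 1

# Arithmetic check on the counts reported in the text.
total = 37520
print(round(100 * 89 / total, 2))                 # 0.24
print(round(100 * 88 / total, 2))                 # 0.23
print(round(100 * (total - 89 - 88) / total, 2))  # 99.53
```

The first toy profile illustrates the compensatory pattern discussed here: it is low on narrativity and word concreteness but high on syntactic simplicity and referential cohesion, so it is not counted as uniformly challenging.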

Thus, texts vary in difficulty along different dimensions. Coh-Metrix augments read- 
ability formulas by providing information about multiple sources of challenges and scaf- 
folds within texts. We can visualize this dance among the linguistic features of texts 
using the Coh-Metrix Text Easability Component Scores. 

Figure 2 provides the average easability scores for 3292 language arts texts and 2741 
science texts above grade level 8 (using Degrees of Reading Power) from the TASA 
corpus. As should be expected, the science texts are lower in narrativity compared to the 
language arts texts. High narrativity texts are characterized by a greater number of events 
and characters and a lower density of informational content. Low narrativity reflects the 
use of more challenging words and concepts and a greater density of information about 
objects and ideas. Science texts also tend to have somewhat lower word concreteness
because science concepts tend to be more abstract than are concepts in the language arts.
Figure 2 further indicates that these challenges associated with low narrativity are offset
by greater syntactic simplicity (e.g. shorter, less complex sentences) and higher referential
cohesion. That is, science texts are relatively high in both syntactic simplicity and
referential cohesion. By contrast, the language arts texts have more syntactic challenges
for the reader and include more cohesion gaps. These challenges are generally surmountable
for most readers, and may even make a narrative text more interesting and enjoyable.
Indeed, highly cohesive stories with simple syntax are not generally considered to be
among the greatest literary works.

Figure 2. Coh-Metrix Text Easability Component Scores for science and language arts texts
above DRP grade 8 from the TASA corpus.

These patterns in Figure 2 comparing science and language arts texts correspond with 
findings reported by McNamara et al. (2012b), in which linguistic features were com- 
pared across text genres (science, history, and language arts) and an estimated grade level 
for the texts. The results of this study confirmed assumptions that science texts are com- 
posed of rare words, making it challenging for students to understand the concepts in the 
text. The results further indicated that the challenges of science texts are offset in various 
ways. Similar to the results depicted in Figure 2, science texts were found to have lower 
syntactic complexity and greater overlap in words and concepts (i.e. referential cohesion)
in comparison to the language arts texts. In contrast, the latter tended to have lower
referential and verb cohesion. Thus, they tend to have sentences that make sense to the 
reader, but have little explicit overlap in objects (nouns) or verbs (actions). By them- 
selves, these indices may imply that narratives are difficult to read. However, narratives 
also tend to be composed of more frequent, familiar words, and they often have high 
causal and temporal cohesion. This situation model cohesion allows the reader to form a 
coherent, mental model of the text’s contents. 

As discussed earlier, the text genres themselves have certain characteristics, by their 
very natures and purposes. For example, readers learn from science and other expository 
texts, and thus the words and concepts are by necessity unfamiliar. Narratives are
intended to weave a story in a fictional world, and thus the words and concepts will tend 
to be more familiar. There is certainly a great deal of variation within each of the genres. 
For example, some of the language arts texts in the TASA corpus are academic texts 
about literature, and some are excerpts from stories and other types of narrative text. 
Similarly, there will be variation among the different subtopics of science. Chemistry 
texts will not have the same profile as biology texts. Nonetheless, as a whole, text genres 
appear to have distinct profiles of linguistic characteristics. 

These profiles indicate that there is a give-and-take among the challenges within the 
texts. There are times when language in narrative texts at the situation model levels com- 
pensates for the more challenging sequences of sentences that might result from low 
overlap in words and concepts. In turn, while the reader is challenged by the unfamiliar- 
ity of the concepts in expository texts, processing is partially eased through simpler 
syntax and greater cohesion. Of course, there is only so much that can be done for a
reader who is low in knowledge about a domain. Thus, despite these potential
compensations within texts, expository texts are far more challenging than narrative texts.


The author’s dance 


While the nature of the words and concepts is fundamental to genre, other characteristics
of text are often rhetorical choices on the part of the author. An author tasked to write
about cell mitosis must use certain words to convey the concepts associated with these 
biological processes. The nature of the words is at least partially driven by the purpose 
of the text (i.e. genre). By contrast, many features of the text such as the complexity of 
the sentences and the cohesion between the sentences are largely driven by the author. 
Whether sentences are long or short, whether causal and temporal relationships are 
explicitly expressed with connectives, whether one sentence overlaps with another, are 
examples of choices that the author makes during the writing process, or they emerge 
from the author’s writing style (and the editor’s). 

The profiles that we observe in texts indicate that authors do not make random choices, 
but rather that these choices must be driven by some awareness of the needs of the reader. 
Indeed, the compensatory patterns observed in texts point toward an epistemic stance on 
the part of writers, editors, and publishers. Epistemic stance is the expression of epis- 
temic relationships between interlocutors with regard to a domain, or their epistemic 
status (Heritage, 2013). 

Do writers have an epistemic stance toward their intended audience that results in a 
compensatory play among the linguistic and semantic characteristics of a text? Clearly, 
good writers have a stance toward their audience. This is not a novel idea. Indeed, com- 
position students are instructed in increasing awareness of their intended audience. 
Skilled writers more effectively judge the interests, knowledge levels, and even reading 
abilities of their audience. In turn, more skilled readers have greater awareness of 
authors’ intentions, and providing instruction to question the author enhances readers’ 
comprehension (Beck and McKeown, 2006). Hence, a writer’s intentions and stance to 
a reader are crucial to the comprehension process. Indeed, the profiles in Figure 1
indicate that texts that are published, such as those in the TASA corpus, are characterized by
more skilled writers who have an epistemic stance toward their readers. And this stance
is observable in the Coh-Metrix component profiles. By contrast, there are also writers 
who gauge their audience less skillfully. These writers may be those who produce the 
incomprehensible expository texts or the tragically boring narratives. 

If it is the case that more skilled writers, that is, those who are more likely to have 
published texts, have greater audience awareness, how might this manifest for devel- 
oping writers? It is clear that cohesion is an important factor to consider in reading 
comprehension, and particularly for expository texts. Expository texts are intended for 
readers who are relatively low in knowledge in the sense that they are likely reading 
the text to learn information. Writers (and textbook publishers) nonetheless often
slightly misjudge their audience's prior knowledge of the domain: studies show that
increasing the cohesion of expository texts improves comprehension
(for a review, see McNamara et al., 2010a). Yet overall, science texts are written with
higher cohesion than narratives, indicating that there is some awareness of the need to 
scaffold the knowledge-seeking reader. Or, on the other side of the coin, narratives and 
language arts texts are written with lower cohesion, indicative of some awareness that 
higher cohesion, or spelling everything out for the reader, makes for poor and boring 
literature. 

How does cohesion manifest in text for developing writers? Experts in the areas of 
English Language Arts and Composition have long assumed that cohesion is a crucial 
component of writing. Cohesive cues such as semantic overlap between sentences and 
paragraphs and connectives between sentences have been consistently pointed to as cru- 
cial components of skilled writing. There is a clear ‘sense’ that skilled writing is more 
coherent and better organized. However, for the most part, many conflate the terms cohe- 
sion and coherence. It is important to distinguish between the cues that are objectively 
observable in the text or discourse (i.e. cohesion) as compared to the connections that are 
formed in the mind of the reader or listener (i.e. coherence). Many composition research- 
ers assume that better writing is more coherent and that this coherence is grounded in 
cohesive cues in the writers’ text. Until recently, there have been more claims in this 
regard than actual research. And there have been few tools available to investigate the 
assumption. Coh-Metrix and other text analysis tools have provided us with the means 
over the past few years to investigate the role of cohesion and other linguistic features in 
essays produced by developing writers. 
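Because cohesion, unlike coherence, is objectively observable in the text, cues such as connectives can in principle be counted. The sketch below illustrates the idea with a connective incidence score (occurrences per 1,000 words). The connective lists, the function name, and the two sample texts are hypothetical and deliberately short; they are not the lists or indices used by Coh-Metrix or any other published tool.

```python
import re

# Hypothetical, deliberately short connective lists (assumptions for illustration).
CONNECTIVES = {
    "causal": {"because", "therefore", "thus", "consequently"},
    "temporal": {"then", "before", "after", "finally"},
    "additive": {"also", "moreover", "furthermore"},
}


def connective_incidence(text):
    """Return the number of connectives per 1,000 words for each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in CONNECTIVES}
    return {
        category: 1000 * sum(1 for w in words if w in members) / len(words)
        for category, members in CONNECTIVES.items()
    }


# Two hypothetical versions of the same content: one spells out the relations,
# the other leaves them for the reader to infer.
explicit = "Heroes matter because they inspire us. Therefore, we also admire them."
implicit = "Heroes matter. They inspire us. We admire them."

print(connective_incidence(explicit))
print(connective_incidence(implicit))
```

Both versions may be equally coherent for a knowledgeable reader; only the first carries explicit cohesive cues that a count of this kind can detect.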

In our research, we have focused on prompt-based essays that are used to assess writ- 
ing skill in high school exit examinations (or college entrance exams). These essays are 
generally time limited (the writer is given 25 minutes to complete the essays) and on 
relatively familiar topics, such as the significance of heroes and celebrities, or the value 
of choices in life. The writer is asked to take a position on a particular question and sup- 
port that position with evidence and examples. We focus on this genre of essay because 
we have developed a writing strategy tutoring system called the Writing Pal (McNamara 
et al., 2012b). This game-based intelligent tutoring system provides instruction in writ- 
ing strategies and essay writing practice. For the latter, we have developed automated 
essay scoring algorithms to provide feedback on the essays. Undergirding this develop- 
ment process has been research to examine the linguistic features characterizing good 
and poor essays. 
We set out on this research agenda with a particular interest in the role of cohesion.
On the one hand, it may be expected that cohesion would be positively related to the 
quality of the essay. Cohesion facilitates the ease of processing for the reader and thus 
better writers might be expected to provide more cohesive cues in their writing. On the 
other hand, better writers may take an epistemic stance toward their true reader, who is 
the person scoring the essay. As depicted in Figure 3, a more skilled writer may be more 
in tune with the probability that the person scoring the essay will be high in knowledge 
and a highly skilled reader. Given that the domain of prompt-based persuasive essays 
tends to be relatively familiar, then the characteristics of essays written by the more 
skilled essay writer ought to be more similar to the profiles observed for the language 
arts texts. Compared to essays written by less skilled writers, they should have greater 
syntactic complexity, more unfamiliar words, and lower cohesion. If the writer holds an 
epistemic stance that the reader has sufficient knowledge about the topic, the writing 
should be more sophisticated than that of less skilled writers. That is, the writer should 
use words that the reader can be expected to know and sentence structures that the reader 
can be expected to successfully parse. In addition, the reader can be expected to success- 
fully generate inferences to bridge cohesion gaps in the text. As such, the better essays 
may also exhibit fewer explicit cohesive cues. 

We have found these trends in several studies. In McNamara et al. (2010a), we exam- 
ined which linguistic features were most predictive of essay quality for 120 college stu- 
dent writers who wrote take home (untimed) essays. We found that better essays were 
more syntactically complex, had a greater diversity of words, and included more rare, 
unfamiliar words. By contrast, no measures of cohesion from Coh-Metrix correlated 
with essay quality. These results indicated that higher quality essays were more likely to 
contain linguistic features associated with text difficulty and sophisticated language. 
However, cohesion was unrelated to essay quality. 

In another study, Crossley et al. (2011b) examined differences between essays written 
by ninth-grade, eleventh-grade, and college students. Thus, this study examined differ- 
ences between the essays as a function of the development of the writer. As expected, the 
essays increased linearly in quality as a function of grade level of the writer. Fortunately, 
the college students wrote higher quality essays than the eleventh-grade students, who in 
turn wrote better essays than the ninth-grade students. The ninth-grade essays were
characterized by higher word frequency (i.e. more familiar words) and lower syntactic
complexity (i.e. simple sentences). Similar to McNamara et al. (2010b), the higher quality
essays were more syntactically complex, had a greater diversity of words, and included
more rare, unfamiliar words as a function of grade level. In addition, cohesion decreased
as a function of grade level. The ninth-grade essays included more explicit cohesive cues
such as connectives and word overlap, whereas the college student essays included the
least cohesive cues (see also Crossley et al., 2011a).

[Figure 3 diagram: the essay writer’s epistemic stance is that the scorer/reader is a
skilled reader, leading the writer to write text that has complex syntax, rare words,
and low cohesion.]

Figure 3. The successful developing writer’s epistemic stance toward their target reader, the
scorer of the essay.

Hence, writers are aware of and able to use cohesive cues in the writing early in their 
development. Indeed, research indicates that children learn and use cohesive devices in 
their writing as early as grade 2 and continue developing in their use at least until around 
grade 8 (King and Rentel, 1979; McCutchen, 1986; McCutchen and Perfetti, 1982). 
After approximately grade 9, however, it appears that the use of these cues decreases as 
writers become more proficient (see also Freedman and Pringle, 1980). At the same time, 
they learn and are able to use more sophisticated language such as rare words, more 
diverse words, and more complex syntax. The decrease in the use of explicit cohesive 
cues indicates that skilled writers increase in their awareness of when these cues are 
needed to support comprehension. Such awareness requires an increased understanding 
of the intended audience, the reader of the essay. 


Conclusion 


Cohesion plays an important function in text. To better understand the role of cohesion 
in text, we need to consider multiple factors. First, there is the text. Second, within the 
text, the level of cohesion in the text is only meaningful relative to other features of the 
text. All of the dimensions of a text interact, each one depending on the other. Considering
one aspect of text in isolation is much like viewing a Monet from a foot away. 
Considering multiple aspects of text is analogous to backing away from the dabs of 
color in an impressionist painting to see the full picture. 

A third consideration is the reader. The reader has varying levels of knowledge, read- 
ing ability, interests, and goals, which in turn interact with the various dimensions and 
characteristics of the text. A fourth consideration is the author. The author creates the 
text. To the extent that the author understands the characteristics of the reader, there can 
be an epistemic relationship between the two, and the author has a more or less success- 
ful epistemic stance toward the reader. Published texts certainly vary in their quality, 
appeal to readers, and success in engaging readers. On the whole, however, we might 
assume that authors of published texts have some sense of the reader. In these texts, we 
observe compensatory patterns between sources of difficulty in text that imply that 
authors have some sense of the interplay between the sources of challenges in text with 
relation to their reader. By contrast, less skilled writers are less aware of their audience 
and also show less evidence of epistemic stance within their writing. 


Acknowledgments 


The author is grateful to many who have contributed to this work across the years including Art 
Graesser, Jonna Kulikowich, Max Louwerse, Randy Floyd, Loel Kim, Phil McCarthy, Zhiqiang 
Cai, Jianmin Dai, and Vasile Rus. Many students have worked on this project, including current 
graduate students Jennifer Weston, Laura Varner, Erica Snow, and Russell Brandon. The author is 
particularly grateful to Laura Varner, who contributed to parts of this manuscript. 


Downloaded from dis.sagepub.com at ARIZONA STATE UNIV on September 12, 2013 


Funding 


The research reported here was supported by the Institute of Education Sciences, US Department 
of Education, through Grants R305A120707 and R305A090623 to Arizona State University, and
Grants R305A080589 and R305G020018-02 to the University of Memphis. The opinions 
expressed are those of the author and do not represent views of the Institute or the US Department 
of Education. 


References 


Alexander PA, Kulikowich JM and Schulze SK (1994) How subject-matter knowledge affects 
recall and interest on the comprehension of scientific exposition. American Educational 
Research Journal 31: 313-337. 

Beck IL and McKeown MG (2006) Improving Comprehension with Questioning the Author: A 
Fresh and Expanded View of a Powerful Approach. New York: Scholastic. 

Beck IL, McKeown MG, Omanson RC and Pople MT (1984) Improving the comprehensibility 
of stories: The effects of revisions that improve coherence. Reading Research Quarterly 19: 
263-277. 

Beck IL, McKeown MG, Sinatra GM and Loxterman JA (1991) Revising social studies text from 
a text-processing perspective: Evidence of improved comprehensibility. Reading Research 
Quarterly 26: 251-276. 

Bransford JD and Johnson MK (1972) Contextual prerequisites for understanding: Some inves- 
tigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior 11: 
717-726. 

Britton BK and Gulgoz S (1991) Using Kintsch’s computational model to improve instruc- 
tional text: Effects of repairing inference calls on recall and cognitive structures. Journal of 
Educational Psychology 83: 329-345. 

Chiesi HL, Spilich GJ and Voss JF (1979) Acquisition of domain-related information in relation 
to high and low domain knowledge. Journal of Verbal Learning and Verbal Behavior 18: 
257-273. 

Crossley SA, Roscoe R, Graesser A and McNamara DS (2011a) Predicting human scores of essay
quality using computational indices of linguistic and textual features. In: Biswas G, Bull S, 
Kay J and Mitrovic A (eds) Proceedings of the 15th International Conference on Artificial 
Intelligence in Education. Auckland, New Zealand: AIED, pp. 438-440. 

Crossley SA, Weston J, McLain-Sullivan ST and McNamara DS (2011b) The development of 
writing proficiency as a function of grade level: A linguistic analysis. Written Communication 
28: 282-311. 

Flesch R (1948) A new readability yardstick. Journal of Applied Psychology 32: 221-233. 

Freedman A and Pringle I (1980) Writing in the college years: Some indices of growth. College 
Composition and Communication 31: 311-324. 

Graesser AC and McNamara DS (2011) Computational analyses of multilevel discourse 
comprehension. Topics in Cognitive Science 2: 371-398. 

Graesser AC, McNamara DS and Kulikowich JM (2011) Coh-Metrix: Providing multilevel 
analyses of text characteristics. Educational Researcher 40: 223-234. 

Graesser AC, Singer M and Trabasso T (1994) Constructing inferences during narrative text 
comprehension. Psychological Review 101: 371-395. 

Haenggi D and Perfetti C (1994) Processing components of college-level reading comprehension. 
Discourse Processes 17: 83-104. 

Haviland SE and Clark HH (1974) What’s new? Acquiring new information as a process in 
comprehension. Journal of Verbal Learning and Verbal Behavior 13: 512-521. 


Discourse Studies 15(5)


Heritage J (2013) Epistemics. Discourse Studies 15(5): xx—xx. 

King M and Rentel V (1979) Toward a theory of early writing development. Research in the 
Teaching of English 13: 243-253. 

Kintsch W (1988) The use of knowledge in discourse processing: A construction-integration 
model. Psychological Review 95: 163-182. 

Kintsch W (1998) Comprehension: A Paradigm for Cognition. Cambridge, MA: Cambridge 
University Press. 

Kintsch W and Keenan J (1973) Reading rate and retention as a function of the number of proposi- 
tions in the base structure of sentences. Cognitive Psychology 5: 257-274. 

Kintsch W and van Dijk TA (1978) Toward a model of text comprehension and production. 
Psychological Review 85: 363-394. 

Kintsch W, Kozminsky E, Streby WJ, et al. (1975) Comprehension and recall of text as a function 
of content variables. Journal of Verbal Learning and Verbal Behavior 14: 196-214. 

Klare GR (1974-1975) Assessing readability. Reading Research Quarterly 10: 62-102. 

Koslin BL, Zeno S and Koslin S (1987) The DRP: An Effective Measure in Reading. New York: 
College Entrance Examination Board. 

McCutchen D (1986) Domain knowledge and linguistic knowledge in the development of writing 
ability. Journal of Memory and Language 25: 431-444. 

McCutchen D and Perfetti C (1982) Coherence and connectedness in the development of discourse 
production. Text 2: 113-139. 

McKoon G and Ratcliff R (1992) Inference during reading. Psychological Review 99: 440-466. 

McNamara DS and Graesser AC (2012) Coh-Metrix: An automated tool for theoretical and 
applied natural language processing. In: McCarthy PM and Boonthum-Denecke C (eds) 
Applied Natural Language Processing and Content Analysis: Identification, Investigation, and 
Resolution. Hershey, PA: IGI Global, pp. 188-205. 

McNamara DS and Kintsch W (1996) Learning from texts: Effects of prior knowledge and text
coherence. Discourse Processes 22: 247-288.

McNamara DS and McDaniel MA (2004) Suppressing irrelevant information: Knowledge activation
or inhibition. Journal of Experimental Psychology: Learning, Memory and Cognition 30:
465-482.

McNamara DS and Magliano JP (2009) Self-explanation and metacognition: The dynamics of
reading. In: Hacker JD, Dunlosky J and Graesser AC (eds) Handbook of Metacognition in
Education. Mahwah, NJ: Erlbaum, pp. 60-81.

McNamara DS and O’Reilly T (2009) Theories of comprehension skill: Knowledge and strategies
versus capacity and suppression. In: Columbus AM (ed.) Advances in Psychology Research,
62. Hauppauge, NY: Nova Science Publishers, Inc., pp. 113-136.

McNamara DS, Crossley SA and McCarthy PM (2010a) Linguistic features of writing quality.
Written Communication 27: 57-86.

McNamara DS, Graesser AC and Louwerse MM (2012a) Sources of text difficulty: Across the 
ages and genres. In: Sabatini JP and Albro E (eds) Assessing Reading in the 21st Century:
Aligning and Applying Advances in the Reading and Measurement Sciences. Lanham, MD: 
R&L Education, pp. 89-116. 

McNamara DS, Kintsch E, Songer NB and Kintsch W (1996) Are good texts always better? 
Interactions of text coherence, background knowledge, and levels of understanding in learning 
from text. Cognition and Instruction 14: 1-43. 

McNamara DS, Louwerse MM, McCarthy PM and Graesser AC (2010b) Coh-Metrix: Capturing 
linguistic features of cohesion. Discourse Processes 47: 292-330. 

McNamara DS, Raine R, Roscoe R, et al. (2012b) The Writing-Pal: Natural language algorithms
to support intelligent tutoring on writing strategies. In: McCarthy PM and
Boonthum-Denecke C (eds) Applied Natural Language Processing and Content Analysis:
Identification, Investigation, and Resolution. Hershey, PA: IGI Global, pp. 298-311.

Miller JR and Kintsch W (1980) Readability and recall of short prose passages: A theoretical 
analysis. Journal of Experimental Psychology: Human Learning and Memory 6: 335-354. 

O’Reilly T and McNamara DS (2007) Reversing the reverse cohesion effect: Good texts can be 
better for strategic, high-knowledge readers. Discourse Processes 43: 121-152. 

Perfetti C (1989) There are generalized abilities and one of them is reading. In: Resnick LB 
(ed.) Knowing, Learning, and Instruction: Essays in Honor of Robert Glaser. Hillsdale, NJ: 
Lawrence Erlbaum Associates, Inc., pp. 307-336. 

van Dijk TA and Kintsch W (1983) Strategies of Discourse Comprehension. New York: Academic 
Press. 

Vidal-Abarca E, Martinez G and Gilabert R (2000) Two procedures to improve instructional text: 
Effects on memory and learning. Journal of Educational Psychology 92: 107-116. 

Wolfe MBW and Goldman SR (2005) Relations between adolescents’ text processing and 
reasoning. Cognition and Instruction 23: 467-502. 


Author biography 


Danielle S McNamara is a Professor in the Psychology Department and Senior Scientist in the 
Learning Sciences Institute at Arizona State University. Her academic background includes a 
Linguistics BA (1982), a Clinical Psychology MS (1989), and a PhD in Cognitive Psychology 
(1992; UC-Boulder). Her research involves the theoretical study of cognitive and discourse pro- 
cesses as well as the application of the learning sciences to educational practice. The overarching 
theme of her research is to better understand cognitive and motivational processes involved in 
reading, writing, memory, and knowledge acquisition and to apply that understanding to educa- 
tional practice by creating and testing educational technologies (e.g. Coh-Metrix, the iSTART 
Reading Strategy tutor, Writing Pal the writing strategy tutor). 




